Abstract. Neural networks are used to fit the Compton form factor $\mathcal{H}$ to HERMES data on
deeply virtual Compton scattering off unpolarized protons. We use this result to predict the
beam charge-spin asymmetry for muon scattering off a proton at the kinematics of the COMPASS
II experiment.



1. Introduction 

Deeply virtual Compton scattering (DVCS) is recognized as the theoretically cleanest process for
accessing generalized parton distributions (GPDs), which describe the three-dimensional
structure of the nucleon in terms of partonic degrees of freedom. Determining GPDs, besides
improving our general understanding of QCD dynamics, allows one to address important questions
such as the partonic decomposition of the nucleon spin and the characterization of multiple hard
reactions in proton-proton collisions at the LHC. Concerning the latter, the nontrivial
transverse structure of the proton described by GPDs, such as the correlation between a parton's longitudinal
momentum fraction and its transverse distance, is already finding its way into popular event
generators.



Similarly to the extraction of ordinary parton distribution functions (PDFs), GPDs
can be extracted by global model fits or local fits to the available data. However, compared to
global PDF fits, extracting GPDs from data is a much more intricate task, and the model ambiguities
are much larger. Because GPDs cannot be fully constrained by data and
depend at the input scale on three variables, the space of possible functions, although
restricted by GPD constraints, is huge. As a result, the theoretical systematic error induced by
the choice of the fitting model is much more serious than in the PDF case, where the model
functions depend at the input scale on only one variable, namely the longitudinal momentum
fraction $x$.

Here we report on some results obtained using an alternative approach [13], in which neural
networks are used in place of specific models. This essentially eliminates the problem of model
dependence and, as an additional advantage, provides a convenient method for propagating
uncertainties from the experimental measurements to the final result. Our approach is similar
to the one already employed for $F_2$ structure function and PDF extraction and is
briefly described in the next section. To reduce the mathematical complexity of the problem,
we have fitted not the GPDs themselves but the dominant Compton form factor (CFF) $\mathcal{H}(x_B, t)$,
which depends on the Bjorken variable $x_B$ and the momentum transfer $t$. At leading order, the imaginary
part of this CFF is related to the corresponding GPD $H(x, \xi, t)$ on the cross-over line $x = \xi$:
(1)



so knowledge of the CFF $\mathcal{H}$ provides us with direct information about the proton structure.



2. The Method of Fitting Data with Neural Networks

Figure 1. A neural network parametrization of the complex-valued CFF $\mathcal{H}(x_B, t)$. Each blob
symbolizes a neuron, and the thickness of the arrows represents the strength of the weights $w_j$.

The type of neural network used in this work, known as a multilayer perceptron, is a mathematical
structure consisting of a number of interconnected "neurons" organized in several layers. It is
shown schematically in Figure 1, where each blob symbolizes a single neuron. Each neuron
has several inputs and one output. The value at the output is given by a function $f(\sum_j w_j x_j)$
of the sum of the input values $x_1, x_2, \ldots$, each weighted by a number $w_j$. For the activation
function $f(x)$ we employed the logistic sigmoid function

$f(x) = 1/(1 + \exp(-x))$

for neurons in the inner ("hidden") layer, while for the input and output layers we used the identity
function.
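The forward pass described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the weight range and the random seed are hypothetical choices, bias terms are omitted because the text does not mention them, and the (2-13-2) layout is taken from the architecture quoted later in the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def init_weights(n_in=2, n_hidden=13, n_out=2, seed=0):
    """Random starting weights for a (2-13-2) network (range is an arbitrary choice)."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    output = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)] for _ in range(n_out)]
    return hidden, output


def forward(weights, x_b, t):
    """Return the two network outputs, read here as (Im H, Re H)."""
    hidden, output = weights
    inputs = [x_b, t]  # input layer: identity activation
    # hidden layer: sigmoid of the weighted sum of the inputs
    h = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in hidden]
    # output layer: identity activation, i.e. a plain weighted sum
    return tuple(sum(w * v for w, v in zip(row, h)) for row in output)


weights = init_weights()
im_h, re_h = forward(weights, x_b=0.1, t=-0.2)
```

With untrained random weights the two outputs are of course meaningless; they only become estimates of the CFF after training.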

By iterating over the following steps the network is trained, i.e., it "learns" how to describe
a certain set of data points:

(i) The kinematic values (two in our case: $x_B$ and $t$) of the first data point are presented to the two
input-layer neurons.

(ii) These values are then propagated through the network according to the values of the weights $w_j$.
In the first iteration, the weights are set to random values.

(iii) As a result, the network produces output values in its output-layer neurons.
Here we have two: $\Im m\,\mathcal{H}$ and $\Re e\,\mathcal{H}$. Obviously, after the first iteration, these will be some
random functions of the input kinematic values: $\Im m\,\mathcal{H}(x_B, t)$ and $\Re e\,\mathcal{H}(x_B, t)$.

(iv) Using these values of the CFF(s), the observable corresponding to the first data point is
calculated and compared to the actually measured value, with the squared error used for
building the standard error function to be minimized.

(v) The obtained error is then used to modify the network: it is propagated backwards through
the layers of the network, and each weight is adjusted such that this error decreases.

(vi) This procedure is then repeated with the next data point, until the whole training set is
exhausted.

This sequence of steps is repeated until the network is able to describe the experimental data
with sufficient accuracy. To guard against overfitting the data ("fitting to the noise"), one
randomly chosen subset of the data is not used for training but only for monitoring the progress
and for stopping the training when the error of the network's description of this subset starts to increase
significantly. This ensures that the resulting neural network represents a function which is not too
complex and which is thus expected to provide a reasonable estimate of the actual underlying
physical law.
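The training loop with a held-out monitoring subset can be sketched as follows. This is a minimal toy, not the procedure used for the actual fits: it has one input and one output, fits the network output directly to the data (rather than to an observable computed from the CFF), includes bias terms, and stops when the validation error has not improved for `patience` passes. All names, the toy data and these choices are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(w_hid, w_out, x):
    """Sigmoid hidden layer, identity output; returns (output, hidden activations)."""
    h = [sigmoid(b + w * x) for b, w in w_hid]
    return w_out[0] + sum(w * hj for w, hj in zip(w_out[1:], h)), h


def mean_sq_error(w_hid, w_out, data):
    return sum((predict(w_hid, w_out, x)[0] - y) ** 2 for x, y in data) / len(data)


def train(train_set, valid_set, n_hidden=5, rate=0.05, max_epochs=500, patience=20, seed=1):
    rng = random.Random(seed)
    w_hid = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = [mean_sq_error(w_hid, w_out, valid_set)]
    best_err, best_w, stale = history[0], ([w[:] for w in w_hid], w_out[:]), 0
    for _ in range(max_epochs):
        for x, y in train_set:  # sequential learning: update after every point
            out, h = predict(w_hid, w_out, x)
            err = out - y  # gradient of 0.5 * (out - y)**2 with respect to out
            for j, hj in enumerate(h):
                delta = err * w_out[j + 1] * hj * (1.0 - hj)  # error propagated back
                w_out[j + 1] -= rate * err * hj
                w_hid[j][0] -= rate * delta
                w_hid[j][1] -= rate * delta * x
            w_out[0] -= rate * err
        history.append(mean_sq_error(w_hid, w_out, valid_set))
        if history[-1] < best_err:
            best_err, best_w, stale = history[-1], ([w[:] for w in w_hid], w_out[:]), 0
        else:
            stale += 1
            if stale >= patience:  # validation error stopped improving
                break
    return best_err, best_w, history


# Toy data: a smooth curve plus Gaussian noise, split into training and monitoring sets.
rng = random.Random(0)
points = [(i / 29, math.sin(3.0 * i / 29) + rng.gauss(0.0, 0.1)) for i in range(30)]
rng.shuffle(points)
train_set, valid_set = points[:22], points[22:]
best_err, best_w, history = train(train_set, valid_set)
```

Returning the weights with the lowest monitoring error, rather than the final ones, is one common way to implement the stopping rule; the text only states that training stops once that error starts to increase significantly.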

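To propagate experimental uncertainties into the final result we used the "Monte Carlo"
method [17], in which neural networks are trained not on the actual data but on a collection of "replica
data sets". These sets are obtained from the original data by generating random artificial data
points according to a Gaussian probability distribution with a width given by the error bar
of the experimental measurement. Taking a large number $N_{rep}$ of such replicas, the resulting
collection of trained neural networks $\mathcal{H}^{(1)}, \ldots, \mathcal{H}^{(N_{rep})}$ defines a probability distribution $\mathcal{P}[\mathcal{H}]$ of
the represented CFF $\mathcal{H}(x_B, t)$ and of any functional $\mathcal{F}[\mathcal{H}]$ thereof. Thus, the mean value of such
a functional and its variance are [17], [14]

$$\langle \mathcal{F}[\mathcal{H}] \rangle = \int \mathcal{D}\mathcal{H}\, \mathcal{P}[\mathcal{H}]\, \mathcal{F}[\mathcal{H}] = \frac{1}{N_{rep}} \sum_{k=1}^{N_{rep}} \mathcal{F}[\mathcal{H}^{(k)}] , \qquad (2)$$

$$\left( \Delta \mathcal{F}[\mathcal{H}] \right)^2 = \langle \mathcal{F}[\mathcal{H}]^2 \rangle - \langle \mathcal{F}[\mathcal{H}] \rangle^2 . \qquad (3)$$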
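The replica method and the ensemble averages of relations (2) and (3) can be illustrated with a short Python sketch. To keep it small, a weighted straight-line fit stands in for the training of one neural network; this substitution, the function names and the toy data points are assumptions for illustration only.

```python
import random


def make_replica(data, rng):
    """Smear each measured value by a Gaussian whose width is its error bar."""
    return [(x, rng.gauss(y, sigma), sigma) for x, y, sigma in data]


def fit_line(data):
    """Stand-in for training one network: weighted least-squares line y = a + b*x."""
    sw = sum(1.0 / s**2 for _, _, s in data)
    sx = sum(x / s**2 for x, _, s in data)
    sy = sum(y / s**2 for _, y, s in data)
    sxx = sum(x * x / s**2 for x, _, s in data)
    sxy = sum(x * y / s**2 for x, y, s in data)
    b = (sw * sxy - sx * sy) / (sw * sxx - sx**2)
    a = (sy - b * sx) / sw
    return lambda x: a + b * x


def ensemble_mean_and_error(models, functional):
    """Mean and standard deviation of a functional over the ensemble, as in (2) and (3)."""
    values = [functional(m) for m in models]
    mean = sum(values) / len(values)
    mean_sq = sum(v * v for v in values) / len(values)
    variance = max(mean_sq - mean**2, 0.0)  # guard against tiny negative rounding
    return mean, variance ** 0.5


# Toy measurements: (kinematic point, measured value, error bar).
data = [(0.1, 1.2, 0.1), (0.2, 1.4, 0.1), (0.3, 1.5, 0.2), (0.4, 1.9, 0.1)]
rng = random.Random(42)
models = [fit_line(make_replica(data, rng)) for _ in range(50)]

# Functional of interest: the fitted function evaluated at one point.
mean, err = ensemble_mean_and_error(models, lambda m: m(0.25))
```

Any other functional of the fitted function, for instance a predicted observable at a new kinematic point, can be passed in place of the last lambda.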



More details about our procedure can be found in [1

3. First results

We now present neural network fits [13] to two sets of HERMES collaboration measurements
[18] of leptoproduction of a real photon by scattering leptons off unpolarized protons (of which
DVCS is a subprocess). One set consists of 18 measurements of the first sine harmonic $A_{LU}^{\sin\phi}$ of
the beam spin asymmetry (BSA)

$$\mathrm{BSA} = \frac{d\sigma_{e^\uparrow} - d\sigma_{e^\downarrow}}{d\sigma_{e^\uparrow} + d\sigma_{e^\downarrow}} \sim A_{LU}^{\sin\phi} \sin\phi \qquad (4)$$

(where $\phi$ is the azimuthal angle in the so-called Trento convention), while in the other set there
are 18 measurements of the first cosine harmonic $A_C^{\cos\phi}$ of the beam charge asymmetry (BCA)

$$\mathrm{BCA} = \frac{d\sigma_{e^+} - d\sigma_{e^-}}{d\sigma_{e^+} + d\sigma_{e^-}} \sim A_C^{\cos 0\phi} + A_C^{\cos\phi} \cos\phi . \qquad (5)$$
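To make the notion of a harmonic of an asymmetry concrete, the following Python sketch builds a beam charge asymmetry from two toy cross sections and projects out its constant and first cosine Fourier coefficients. The cross-section shapes, the coefficient values 0.02 and 0.1, and the function names are hypothetical, and the plain Fourier projection used here is a generic mathematical illustration that need not coincide with the extraction procedure used by the experiment.

```python
import math


def asymmetry(sigma_plus, sigma_minus):
    """Generic asymmetry (sigma_plus - sigma_minus) / (sigma_plus + sigma_minus)."""
    return (sigma_plus - sigma_minus) / (sigma_plus + sigma_minus)


def cosine_harmonic(func, n, steps=360):
    """n-th cosine Fourier coefficient of func(phi), from a uniform sum over [0, 2*pi)."""
    norm = 2.0 if n > 0 else 1.0
    total = 0.0
    for k in range(steps):
        phi = 2.0 * math.pi * k / steps
        total += func(phi) * math.cos(n * phi)
    return norm * total / steps


def bca(phi):
    """Toy beam charge asymmetry with a constant term and a cos(phi) term."""
    modulation = 0.02 + 0.1 * math.cos(phi)
    sigma_positron = 1.0 + modulation  # toy cross section for the positive beam charge
    sigma_electron = 1.0 - modulation  # toy cross section for the negative beam charge
    return asymmetry(sigma_positron, sigma_electron)


a_cos0 = cosine_harmonic(bca, 0)  # constant term of the toy asymmetry
a_cos1 = cosine_harmonic(bca, 1)  # first cosine harmonic of the toy asymmetry
```

The same projection with `math.sin` in place of `math.cos` would give the sine harmonics relevant for the beam spin asymmetry.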

The procedure described above, in which the neural network is modified after each data point is presented, is known as
sequential learning. In the alternative procedure, batch learning, the network is modified only after the complete
training set has been presented to the network and the total error has been calculated, i.e., the last two steps of the above procedure
are reversed. We tried both types of learning, and batch learning turned out to be more robust.

Figure 2. First cosine harmonic of the beam charge asymmetry $A_C^{\cos\phi}$ and first sine harmonic
of the beam spin asymmetry $A_{LU}^{\sin\phi}$ resulting from the neural network fit, shown together with the data
[18] used for training.

Both sets cover the identical kinematic region

$0.05 < x_B < 0.24$, $0.02 < -t/\mathrm{GeV}^2 < 0.46$, and $1.2 < Q^2/\mathrm{GeV}^2 < 6.11$,

and in this region the BSA and BCA are determined essentially by the imaginary and the real part
of the Compton form factor $\mathcal{H}$, respectively, so we ignored other CFFs. Furthermore, in this
region the dependence of $\mathcal{H}$ on the photon virtuality $Q^2$ is weak and, therefore, we neglected
it for simplicity. Thus, at present, just a single CFF $\mathcal{H}(x_B, t)$, or two real-valued functions
$\Im m\,\mathcal{H}(x_B, t)$ and $\Re e\,\mathcal{H}(x_B, t)$, are extracted from the data by neural networks.

We fitted these data using the method described in the previous section, constructing
50 neural networks with architecture (2-13-2), i.e., with two input neurons (for the two kinematic
variables $x_B$ and $t$), 13 neurons in the hidden layer (we convinced ourselves that
adding or removing a few neurons does not significantly change the results), and 2 output neurons
(representing $\Im m\,\mathcal{H}(x_B, t)$ and $\Re e\,\mathcal{H}(x_B, t)$). In Figure 2 we show the fit quality by presenting
the data used for training together with the description of these data by the final set of 50 neural
networks, using relations (2) and (3).

The CFF $\mathcal{H}$ itself, which is our main result, is displayed in Figure 3, where $\Im m\,\mathcal{H}$ (upper panels)
and $\Re e\,\mathcal{H}$ (lower panels) are plotted separately. One notices that in the kinematic region of
the measured data (roughly the middle vertical third of the left panels), where the neural
networks are interpolating the data, the CFF $\mathcal{H}$ is estimated with a reasonably small uncertainty.
However, as one starts to extrapolate the fitted CFF $\mathcal{H}$ outside of the data region (left and right
thirds of the left panels and the whole of the right panels), the neural network parametrization of
the CFF $\mathcal{H}$ is largely unconstrained. This is particularly visible in the right panels, illustrating the
difficulty of a model-independent extrapolation towards $t = 0$, which is a limit of particular
interest for hadron structure studies.




Figure 3. Neural network extraction of $\Im m\,\mathcal{H}(x_B, t)$ and $\Re e\,\mathcal{H}(x_B, t)$ from HERMES BCA and BSA [18]
data. The actual data region is shown as a vertical band in the middle of the left two panels. Outside of this
band and on the whole of the right two panels, the neural networks are extrapolating from the data.



Finally, as an example of a genuine prediction coming from our analysis, we plot in Figure 4
the beam charge-spin asymmetry (BCSA),



BCSA 



(6) 



as a function of the momentum transfer $t$, for several kinematic points that are characteristic of the
COMPASS II experiment, which scatters muons and antimuons off a proton target (the muon
is taken to be massless and the polarization is set equal to 0.8). This experiment was chosen
because its kinematics overlaps with that of the HERMES data used for neural network training.
Hence these predictions represent partly an interpolation and partly an extrapolation of the HERMES
data, thus testing the whole approach in a nontrivial way.



4. Conclusions and outlook 

By explicit extraction of the Compton form factor $\mathcal{H}$ from HERMES data on beam spin and charge
asymmetries we demonstrated that neural networks can be a powerful tool for studying hadron
structure. They can interpolate experimental data in an unbiased way, thus eliminating the
systematic error introduced by choosing a specific fitting function in the standard model-fitting
approaches. Since GPDs and CFFs are multivariate functions, this advantage is much more
pronounced than in PDF fitting, where PDFs at the input scale depend only on a single
variable. Still, this feature of the neural network approach cuts both ways: unbiased fitting of
GPDs and CFFs to the precision of the present PDF fits would require orders of magnitude
more data than presently available, in order to cover the larger dimension of the kinematic space. To
overcome this problem, one may deliberately introduce some biases and constraints on the neural
networks, especially those that correspond to certain well-established properties of the represented
functions (e.g., dispersion relations between the imaginary and real parts of CFFs).
Furthermore, one could view neural network fits as intermediate results and use them as a tool
for model-dependent studies.